Estimating Node Importance in Knowledge Graphs Using Graph Neural Networks
How can we estimate the importance of nodes in a knowledge graph (KG)? A KG
is a multi-relational graph that has proven valuable for many tasks including
question answering and semantic search. In this paper, we present GENI, a
method for tackling the problem of estimating node importance in KGs, which
enables several downstream applications such as item recommendation and
resource allocation. While a number of approaches have been developed to
address this problem for general graphs, they do not fully utilize information
available in KGs, or lack the flexibility needed to model the complex relationship between entities and their importance. To address these limitations, we explore supervised machine learning algorithms. In particular, building upon recent advances in graph neural networks (GNNs), we develop GENI, a GNN-based method designed to address the distinctive challenges of predicting node importance in KGs. Instead of aggregating node embeddings, our method aggregates importance scores via a predicate-aware attention mechanism and a flexible centrality adjustment. In our evaluation of GENI and existing
methods on predicting node importance in real-world KGs with different
characteristics, GENI achieves 5-17% higher NDCG@100 than the state of the art.
Comment: KDD 2019 Research Track. 11 pages.
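GENI is evaluated with NDCG@100, which rewards rankings that place truly important nodes near the top. A minimal sketch of NDCG@k (standard definition, not the authors' evaluation code):

```python
import math

def ndcg_at_k(predicted_scores, true_scores, k):
    """NDCG@k: rank items by predicted score, compute DCG over the true
    relevance of the top-k, and normalize by the ideal (sorted) DCG."""
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    dcg = sum(true_scores[i] / math.log2(rank + 2)
              for rank, i in enumerate(order[:k]))
    ideal = sorted(true_scores, reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

A perfect ranking scores 1.0; any misordering of relevant items lowers the score.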
MetaGL: Evaluation-Free Selection of Graph Learning Models via Meta-Learning
Given a graph learning task, such as link prediction, on a new graph, how can
we select the best method as well as its hyperparameters (collectively called a
model) without having to train or evaluate any model on the new graph? Model
selection for graph learning has been largely ad hoc. A typical approach has
been to apply popular methods to new datasets, but this is often suboptimal. On
the other hand, systematically comparing models on the new graph quickly
becomes too costly, or even impractical. In this work, we develop the first
meta-learning approach for evaluation-free graph learning model selection,
called MetaGL, which utilizes the prior performances of existing methods on
various benchmark graph datasets to automatically select an effective model for
the new graph, without any model training or evaluations. To quantify
similarities across a wide variety of graphs, we introduce specialized
meta-graph features that capture the structural characteristics of a graph.
Then we design G-M network, which represents the relations among graphs and
models, and develop a graph-based meta-learner operating on this G-M network,
which estimates the relevance of each model to different graphs. Extensive
experiments show that using MetaGL to select a model for the new graph greatly
outperforms several existing meta-learning techniques tailored for graph
learning model selection (up to 47% better), while being extremely fast at test
time (~1 sec).
Comment: ICLR 202
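The core idea of evaluation-free selection can be illustrated with a much simpler stand-in for MetaGL's graph-based meta-learner: compare the new graph's meta-features to those of benchmark graphs and reuse the best-known model from the closest one. This is a hypothetical 1-nearest-neighbor sketch, not the G-M network approach itself:

```python
import math

def select_model(new_graph_feats, bench_feats, perf_table):
    """Evaluation-free model selection, simplified to 1-nearest-neighbor:
    find the benchmark graph whose meta-features are closest to the new
    graph's, and return the model that performed best on that benchmark.
    (MetaGL itself learns a graph-based meta-learner over a G-M network.)"""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = min(bench_feats,
                  key=lambda g: dist(new_graph_feats, bench_feats[g]))
    perfs = perf_table[nearest]  # {model_name: past performance on `nearest`}
    return max(perfs, key=perfs.get)
```

No model is trained or evaluated on the new graph; only the meta-feature lookup and the stored performance table are used.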
MultiImport: Inferring Node Importance in a Knowledge Graph from Multiple Input Signals
Given multiple input signals, how can we infer node importance in a knowledge
graph (KG)? Node importance estimation is a crucial and challenging task that
can benefit many applications, including recommendation, search, and query
disambiguation. A key challenge towards this goal is how to effectively use
input from different sources. On the one hand, a KG is a rich source of
information, with multiple types of nodes and edges. On the other hand, there
are external input signals, such as the number of votes or pageviews, which can
directly tell us about the importance of entities in a KG. While several
methods have been developed to tackle this problem, their use of these external
signals has been limited as they are not designed to consider multiple signals
simultaneously. In this paper, we develop an end-to-end model MultiImport,
which infers latent node importance from multiple, potentially overlapping,
input signals. MultiImport is a latent variable model that captures the
relation between node importance and input signals, and effectively learns from
multiple signals with potential conflicts. Also, MultiImport provides an
effective estimator based on attentive graph neural networks. We ran
experiments on real-world KGs to show that MultiImport handles several
challenges involved with inferring node importance from multiple input signals,
and consistently outperforms existing methods, achieving up to 23.7% higher
NDCG@100 than the state-of-the-art method.
Comment: KDD 2020 Research Track. 10 pages.
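The multi-signal setting can be made concrete with a toy fusion rule: each external signal (pageviews, votes, etc.) covers only some nodes, so an estimate must combine whichever signals are available per node. This hypothetical sketch uses fixed weights and min-max normalization, whereas MultiImport learns the signal-importance relation with a latent variable model and attentive GNNs:

```python
def fuse_signals(signals, weights):
    """Toy fusion of multiple, potentially overlapping importance signals:
    min-max normalize each signal, then take a weighted average over the
    signals that actually cover each node."""
    normed = {}
    for name, values in signals.items():  # values: {node: raw score}
        lo, hi = min(values.values()), max(values.values())
        span = (hi - lo) or 1.0
        normed[name] = {n: (v - lo) / span for n, v in values.items()}
    nodes = {n for values in signals.values() for n in values}
    fused = {}
    for n in nodes:
        avail = [(weights[name], normed[name][n])
                 for name in signals if n in signals[name]]
        total = sum(w for w, _ in avail)
        fused[n] = sum(w * s for w, s in avail) / total
    return fused
```

Even this toy version shows the overlap problem the paper targets: nodes covered by conflicting signals need a principled way to reconcile them.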
CGC: Contrastive Graph Clustering for Community Detection and Tracking
Given entities and their interactions in web data, which may have occurred at different times, how can we find communities of entities and track their evolution? In this paper, we approach this important task from a graph clustering perspective. Recently, state-of-the-art clustering performance in
various domains has been achieved by deep clustering methods. Especially, deep
graph clustering (DGC) methods have successfully extended deep clustering to
graph-structured data by learning node representations and cluster assignments
in a joint optimization framework. Despite some differences in modeling choices
(e.g., encoder architectures), existing DGC methods are mainly based on
autoencoders and use the same clustering objective with relatively minor
adaptations. Also, while many real-world graphs are dynamic, previous DGC
methods considered only static graphs. In this work, we develop CGC, a novel
end-to-end framework for graph clustering, which fundamentally differs from
existing methods. CGC learns node embeddings and cluster assignments in a
contrastive graph learning framework, where positive and negative samples are
carefully selected in a multi-level scheme such that they reflect hierarchical
community structures and network homophily. Also, we extend CGC for
time-evolving data, where temporal graph clustering is performed in an
incremental learning fashion, with the ability to detect change points.
Extensive evaluation on real-world graphs demonstrates that the proposed CGC
consistently outperforms existing methods.
Comment: TheWebConf 2022 Research Track.
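The contrastive objective underlying frameworks like CGC can be sketched with a plain InfoNCE-style loss: pull an anchor embedding toward its positive sample and push it away from negatives. CGC's multi-level selection of positives and negatives (community structure, homophily) is not modeled in this minimal sketch:

```python
import math

def info_nce(anchor, positive, negatives, temp=0.5):
    """InfoNCE-style contrastive loss on embedding vectors: the loss is low
    when the anchor is similar to its positive and dissimilar from all
    negatives (cosine similarity, temperature-scaled softmax)."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    pos = math.exp(cos(anchor, positive) / temp)
    negs = sum(math.exp(cos(anchor, n) / temp) for n in negatives)
    return -math.log(pos / (pos + negs))
```

Minimizing this over many (anchor, positive, negatives) triples shapes the embedding space so that same-community nodes cluster together.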
Fairness-Aware Graph Neural Networks: A Survey
Graph Neural Networks (GNNs) have become increasingly important due to their
representational power and state-of-the-art predictive performance on many
fundamental learning tasks. Despite this success, GNNs suffer from fairness
issues that arise as a result of the underlying graph data and the fundamental
aggregation mechanism that lies at the heart of the large class of GNN models.
In this article, we examine and categorize fairness techniques for improving
the fairness of GNNs. Previous work on fair GNN models and techniques is discussed in terms of whether it focuses on improving fairness during a preprocessing step, during training, or in a post-processing phase.
Furthermore, we discuss how such techniques can be used together where appropriate, and highlight their advantages and underlying intuition. We also
introduce an intuitive taxonomy for fairness evaluation metrics including
graph-level fairness, neighborhood-level fairness, embedding-level fairness,
and prediction-level fairness metrics. In addition, graph datasets that are
useful for benchmarking the fairness of GNN models are summarized succinctly.
Finally, we highlight key open problems and challenges that remain to be addressed.
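One of the prediction-level fairness metrics in the family this survey categorizes is statistical parity difference: the gap in positive-prediction rates between two demographic groups. A minimal sketch for binary predictions and binary group labels:

```python
def statistical_parity_difference(preds, groups):
    """Prediction-level fairness metric: difference between the
    positive-prediction rates of group 1 and group 0. `preds` are 0/1
    predictions; `groups` are 0/1 group labels aligned with `preds`.
    A value of 0 means the two groups receive positive predictions
    at the same rate."""
    rate = {}
    for g in (0, 1):
        members = [p for p, grp in zip(preds, groups) if grp == g]
        rate[g] = sum(members) / len(members)
    return rate[1] - rate[0]
```

Embedding-level and neighborhood-level metrics from the survey's taxonomy measure analogous gaps at earlier stages of the GNN pipeline.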
Fast and scalable method for distributed Boolean tensor factorization
How can we analyze tensors that are composed of 0's and 1's? How can we efficiently analyze such Boolean tensors with millions or even billions of entries? Boolean tensors often represent relationships, memberships, or occurrences of events such as subject-relation-object tuples in knowledge base data (e.g., 'Seoul'-'is the capital of'-'South Korea'). Boolean tensor factorization (BTF) is a useful tool for analyzing binary tensors to discover latent factors from them. Furthermore, BTF is known to produce more interpretable and sparser results than normal factorization methods. Although several BTF algorithms exist, they do not scale up to large-scale Boolean tensors. In this paper, we propose DBTF, a distributed method for Boolean CP (DBTF-CP) and Tucker (DBTF-TK) factorizations running on the Apache Spark framework. By distributed data generation with minimal network transfer, exploiting the characteristics of Boolean operations, and with careful partitioning, DBTF successfully tackles the high computational cost and minimizes the intermediate data. Experimental results show that DBTF-CP decomposes up to 16^3-32^3 x larger tensors than existing methods in 82-180 x less time, and DBTF-TK decomposes up to 8^3-16^3 x larger tensors than existing methods in 86-129 x less time. Furthermore, both DBTF-CP and DBTF-TK exhibit near-linear scalability in terms of tensor dimensionality, density, rank, and machines.
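The Boolean CP model that DBTF factorizes replaces the usual sum with a logical OR: entry (i, j, k) of the reconstruction is the OR over ranks r of A[i][r] AND B[j][r] AND C[k][r]. A small sketch of that reconstruction (it illustrates the model, not DBTF's distributed algorithm):

```python
def boolean_cp_reconstruct(A, B, C, rank):
    """Boolean CP reconstruction: X[i][j][k] = OR over r of
    A[i][r] AND B[j][r] AND C[k][r], with A, B, C given as
    lists of 0/1 rows (one row per index of each tensor mode)."""
    I, J, K = len(A), len(B), len(C)
    X = [[[0] * K for _ in range(J)] for _ in range(I)]
    for i in range(I):
        for j in range(J):
            for k in range(K):
                X[i][j][k] = int(any(A[i][r] and B[j][r] and C[k][r]
                                     for r in range(rank)))
    return X
```

Because overlapping rank-1 components combine by OR rather than addition, Boolean factors tend to be sparser and more interpretable than their real-valued counterparts, as the abstract notes.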